Introduction

The IREX 10: Identification Track assesses iris recognition performance for identification (a.k.a. one-to-many) applications. Most flagship deployments of iris recognition operate in identification mode, providing services ranging from prison management and border security to expedited processing and distribution of resources. The evaluation is administered at the Image Group’s Biometrics Research Lab (BRL), where developers’ matching software is tested over sequestered iris data. Because the evaluation is ongoing, developers may submit at any time.

Leaderboard

Accuracy Metric: FNIR (i.e., “miss rate”) at an FPIR of 0.01 (± 90% confidence)
Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

The number after the ± indicates either the 90% confidence interval (for accuracy) or the standard deviation (for times and sizes).
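As a rough illustration of the leaderboard metric, the sketch below picks a decision threshold that holds FPIR at a target value and then measures FNIR at that threshold. It assumes dissimilarity scores (lower means a better match) and a strict "score below threshold" decision rule. The function names and toy data are hypothetical and are not taken from the IREX API.

```python
import math

def threshold_at_fpir(nonmated_best, target_fpir):
    """Pick a threshold t so that at most target_fpir of nonmated searches
    have a best (lowest) dissimilarity score strictly below t."""
    scores = sorted(nonmated_best)
    allowed = math.floor(target_fpir * len(scores))  # tolerated false positives
    return scores[allowed] if allowed < len(scores) else math.inf

def fpir(nonmated_best, t):
    """Fraction of nonmated searches that return a candidate below t."""
    return sum(s < t for s in nonmated_best) / len(nonmated_best)

def fnir(mate_scores, t):
    """Fraction of mated searches whose true mate is not returned below t.
    Use math.inf when the mate is missing from the candidate list."""
    return sum(s >= t for s in mate_scores) / len(mate_scores)

# Toy data (hypothetical): best nonmated score per search, mate score per search.
nonmated_best = [5, 1, 3, 7, 2, 8, 6, 4]
mate_scores = [0.5, 2.9, 3.0, math.inf]

t = threshold_at_fpir(nonmated_best, 0.25)
print(t, fpir(nonmated_best, t), fnir(mate_scores, t))
```

Using a strict inequality keeps the achieved FPIR at or below the target even when several nonmated searches tie at the threshold score.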

Detection-Error Trade-off Plots

Core accuracy for the identification task can be characterized by Detection-error trade-off (DET) plots. Generally, curves lower down in a DET plot correspond to more accurate matchers. The plots are interactive through the use of the Plotly.js graphing library.
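The points on a DET curve can be thought of as (FPIR, FNIR) pairs obtained by sweeping the decision threshold. The following is a minimal sketch under the assumption that scores are dissimilarities and that a candidate is accepted when its score is strictly below the threshold. The helper name `det_points` is illustrative only.

```python
import math

def det_points(nonmated_best, mate_scores):
    """Return (threshold, FPIR, FNIR) triples, one per observed finite score.

    nonmated_best: best (lowest) score returned by each nonmated search.
    mate_scores:   score of the true mate for each mated search,
                   math.inf if the mate was not returned.
    """
    thresholds = sorted({s for s in list(nonmated_best) + list(mate_scores)
                         if math.isfinite(s)})
    points = []
    for t in thresholds:
        fpir = sum(s < t for s in nonmated_best) / len(nonmated_best)
        fnir = sum(s >= t for s in mate_scores) / len(mate_scores)
        points.append((t, fpir, fnir))
    return points

for t, fpir, fnir in det_points([1, 2, 3, 4], [0.5, 1.5, 2.5, math.inf]):
    print(f"t={t}: FPIR={fpir:.2f}, FNIR={fnir:.2f}")
```

As the threshold loosens, FPIR can only rise and FNIR can only fall, which is why a curve closer to the bottom of the plot indicates a more accurate matcher at every operating point.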

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person
Dataset: Operational Dataset 4th pull
Samples used: One eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One enrollment session per person

Rank Accuracy

Rank-based metrics are generally better at reflecting performance for investigational tasks, where the algorithm returns a list of candidates for an inspector to scrutinize further. The rank 10 “hit rate” is the fraction of searches that return the correct candidate within the top 10 candidates. The miss rate is one minus the hit rate.
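The rank-based hit rate can be sketched in a few lines. The example assumes each mated search has been reduced to the 1-based rank of the correct candidate, with `None` when the mate is absent from the candidate list. This representation is an assumption made for illustration.

```python
def hit_rate(mate_ranks, max_rank=10):
    """Fraction of searches whose correct candidate appears at rank <= max_rank.

    mate_ranks: 1-based rank of the true mate per search, or None if absent.
    """
    hits = sum(r is not None and r <= max_rank for r in mate_ranks)
    return hits / len(mate_ranks)

ranks = [1, 3, 11, None, 10]   # hypothetical search outcomes
rank10_hit = hit_rate(ranks, max_rank=10)
rank10_miss = 1 - rank10_hit   # the miss rate is one minus the hit rate
print(rank10_hit, rank10_miss)
```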

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

Computation Time

Computation times are measured as the elapsed real time (i.e., “wall clock” time) as opposed to CPU time. Timing estimates were computed on unloaded machines with only a single process dedicated to biometric operations. The test machines are Dell PowerEdge M910 blades with dual Intel Xeon X7560 2.3 GHz CPUs (with eight cores per processor).
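The distinction between wall-clock time and CPU time can be illustrated with Python's standard `time` module. This is only a sketch of the concept: the evaluation itself times compiled submissions, and the `timed` helper below is hypothetical.

```python
import time

def timed(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real ("wall clock") time
    cpu_start = time.process_time()    # CPU time consumed by this process
    result = func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu

# Sleeping consumes wall-clock time but almost no CPU time.
_, wall, cpu = timed(time.sleep, 0.05)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

Wall-clock time includes waiting on I/O and memory, which is why measuring on an unloaded machine matters for comparability.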



Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person


Previous IREX evaluations identified a speed-accuracy trade-off whereby the more accurate matchers tend to take longer to return search results. The plot below shows FNIR as a function of median search time for each matcher. FNIR is computed at an FPIR of \(0.01\).

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

Demographics

Idemia

The figure below shows the FNIR at FPIR=0.01 (t = 2700) for different demographic groups. The bars show 95% confidence intervals.
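The page does not state how the confidence intervals are computed. One common choice for a proportion such as FNIR is the Wilson score interval, sketched below purely as an assumption. The counts in the example are made up.

```python
import math

def wilson_interval(misses, searches, z=1.96):
    """Wilson score interval for a miss rate (z=1.96 gives roughly 95%)."""
    p = misses / searches
    denom = 1 + z * z / searches
    centre = (p + z * z / (2 * searches)) / denom
    half = z * math.sqrt(p * (1 - p) / searches
                         + z * z / (4 * searches ** 2)) / denom
    return centre - half, centre + half

# Hypothetical group: 50 misses out of 1,000 mated searches.
low, high = wilson_interval(50, 1000)
print(f"FNIR = 0.050, 95% CI = [{low:.4f}, {high:.4f}]")
```

Smaller demographic groups yield wider intervals, which is consistent with the need to consolidate categories to improve statistical power.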

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

Some consolidation of demographic information was necessary to improve statistical power. Eye color was consolidated to either light (grey, blue, or green) or dark (brown or black). Some subjects were labeled as being neither male nor female. Meaningful results for these categories could not be obtained because their sample sizes are too small. For the same reason, results for races other than white and black are not shown. The precise definitions of race, sex, and eye color used here can be found in EBTS version 10.0.

Logistic Regression

This section models the relationship between FNIR and various demographic characteristics using logistic regression. The response variable is whether the search produces a false negative at an FPIR of 0.01. The precise logit relationship is

\[\ell = \log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{Sex} + \beta_2\,\text{Race} + \beta_3\,\text{Eye color} + \beta_4\,(\text{Sex}\times\text{Eye color}) + \beta_5\,(\text{Sex}\times\text{Race})\]

where \(p\) is the probability of a false negative and \(\ell\) is the log-odds (logit) of a false negative.

n = 312,114
McFadden’s \(R^2\) = 0.0000437

Negative (blue) values mean the probability of a miss is decreased. McFadden’s pseudo \(R^2\) is a measure of goodness-of-fit that produces values between 0 and 1. Race, sex, and eye color are generally poor predictors of accuracy, so the value is typically low.

The model does not include any interaction between race and eye color because too few black subjects had light eyes to produce meaningful results. Eye color was unavailable for some subjects, so multiple imputation by chained equations (MICE) was used to impute it.
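To make the goodness-of-fit measure concrete, the sketch below computes McFadden's pseudo \(R^2\) as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model, and shows the inverse logit that maps \(\ell\) back to a probability. The fitted probabilities are supplied by hand here. No actual regression on the evaluation data is reproduced, and the function names are illustrative.

```python
import math

def inv_logit(ell):
    """Map log-odds ell back to a probability p."""
    return 1.0 / (1.0 + math.exp(-ell))

def log_likelihood(outcomes, probs):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    return sum(math.log(p) if y else math.log(1.0 - p)
               for y, p in zip(outcomes, probs))

def mcfadden_r2(outcomes, model_probs):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = log_likelihood(outcomes, [base_rate] * len(outcomes))
    ll_model = log_likelihood(outcomes, model_probs)
    return 1.0 - ll_model / ll_null

# Hypothetical data: 1 = the search produced a false negative.
misses = [1, 0, 0, 0]
print(mcfadden_r2(misses, [0.25, 0.25, 0.25, 0.25]))  # no better than the base rate
print(mcfadden_r2(misses, [0.9, 0.1, 0.1, 0.1]))      # informative predictions
```

A value near zero, like the one reported above, means the demographic covariates barely improve on simply predicting the overall miss rate for every search.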

Demographic breakdown of false positives

The figures below show the demographic composition of false positives. Most false positives are likely to be white and male since they comprise the majority of the test data. The precise demographic breakdown of the test data is
  • 68.5% White and male
  • 16.8% White and female
  • 12.8% Black and male
  • 1.9% Black and female
and
  • 85.5% Dark eyes
  • 14.5% Light eyes

Other races, sexes, and eye colors are ignored due to their infrequent occurrence.


Demographic breakdown of hits

Neurotechnology

The figure below shows the FNIR at FPIR=0.01 (t = 0.00917) for different demographic groups. The bars show 95% confidence intervals.

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

Some consolidation of demographic information was necessary to improve statistical power. Eye color was consolidated to either light (grey, blue, or green) or dark (brown or black). Some subjects were labeled as being neither male nor female. Meaningful results for these categories could not be obtained because their sample sizes are too small. For the same reason, results for races other than white and black are not shown. The precise definitions of race, sex, and eye color used here can be found in EBTS version 10.0.

Logistic Regression

This section models the relationship between FNIR and various demographic characteristics using logistic regression. The response variable is whether the search produces a false negative at an FPIR of 0.01. The precise logit relationship is

\[\ell = \log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{Sex} + \beta_2\,\text{Race} + \beta_3\,\text{Eye color} + \beta_4\,(\text{Sex}\times\text{Eye color}) + \beta_5\,(\text{Sex}\times\text{Race})\]

where \(p\) is the probability of a false negative and \(\ell\) is the log-odds (logit) of a false negative.

n = 312,114
McFadden’s \(R^2\) = 0.0000437

Negative (blue) values mean the probability of a miss is decreased. McFadden’s pseudo \(R^2\) is a measure of goodness-of-fit that produces values between 0 and 1. Race, sex, and eye color are generally poor predictors of accuracy, so the value is typically low.

The model does not include any interaction between race and eye color because too few black subjects had light eyes to produce meaningful results. Eye color was unavailable for some subjects, so multiple imputation by chained equations (MICE) was used to impute it.

Demographic breakdown of false positives

The figures below show the demographic composition of false positives. Most false positives are likely to be white and male since they comprise the majority of the test data. The precise demographic breakdown of the test data is
  • 68.5% White and male
  • 16.8% White and female
  • 12.8% Black and male
  • 1.9% Black and female
and
  • 85.5% Dark eyes
  • 14.5% Light eyes

Other races, sexes, and eye colors are ignored due to their infrequent occurrence.


Demographic breakdown of hits

NEC

The figure below shows the FNIR at FPIR=0.01 (t = 12295) for different demographic groups. The bars show 95% confidence intervals.

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

Some consolidation of demographic information was necessary to improve statistical power. Eye color was consolidated to either light (grey, blue, or green) or dark (brown or black). Some subjects were labeled as being neither male nor female. Meaningful results for these categories could not be obtained because their sample sizes are too small. For the same reason, results for races other than white and black are not shown. The precise definitions of race, sex, and eye color used here can be found in EBTS version 10.0.

Logistic Regression

This section models the relationship between FNIR and various demographic characteristics using logistic regression. The response variable is whether the search produces a false negative at an FPIR of 0.01. The precise logit relationship is

\[\ell = \log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{Sex} + \beta_2\,\text{Race} + \beta_3\,\text{Eye color} + \beta_4\,(\text{Sex}\times\text{Eye color}) + \beta_5\,(\text{Sex}\times\text{Race})\]

where \(p\) is the probability of a false negative and \(\ell\) is the log-odds (logit) of a false negative.

n = 312,114
McFadden’s \(R^2\) = 0.0000437

Negative (blue) values mean the probability of a miss is decreased. McFadden’s pseudo \(R^2\) is a measure of goodness-of-fit that produces values between 0 and 1. Race, sex, and eye color are generally poor predictors of accuracy, so the value is typically low.

The model does not include any interaction between race and eye color because too few black subjects had light eyes to produce meaningful results. Eye color was unavailable for some subjects, so multiple imputation by chained equations (MICE) was used to impute it.

Demographic breakdown of false positives

The figures below show the demographic composition of false positives. Most false positives are likely to be white and male since they comprise the majority of the test data. The precise demographic breakdown of the test data is
  • 68.5% White and male
  • 16.8% White and female
  • 12.8% Black and male
  • 1.9% Black and female
and
  • 85.5% Dark eyes
  • 14.5% Light eyes

Other races, sexes, and eye colors are ignored due to their infrequent occurrence.


Demographic breakdown of hits

EyeCool

The figure below shows the FNIR at FPIR=0.01 (t = 0.231) for different demographic groups. The bars show 95% confidence intervals.

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

Some consolidation of demographic information was necessary to improve statistical power. Eye color was consolidated to either light (grey, blue, or green) or dark (brown or black). Some subjects were labeled as being neither male nor female. Meaningful results for these categories could not be obtained because their sample sizes are too small. For the same reason, results for races other than white and black are not shown. The precise definitions of race, sex, and eye color used here can be found in EBTS version 10.0.

Logistic Regression

This section models the relationship between FNIR and various demographic characteristics using logistic regression. The response variable is whether the search produces a false negative at an FPIR of 0.01. The precise logit relationship is

\[\ell = \log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{Sex} + \beta_2\,\text{Race} + \beta_3\,\text{Eye color} + \beta_4\,(\text{Sex}\times\text{Eye color}) + \beta_5\,(\text{Sex}\times\text{Race})\]

where \(p\) is the probability of a false negative and \(\ell\) is the log-odds (logit) of a false negative.

n = 312,114
McFadden’s \(R^2\) = 0.0000437

Negative (blue) values mean the probability of a miss is decreased. McFadden’s pseudo \(R^2\) is a measure of goodness-of-fit that produces values between 0 and 1. Race, sex, and eye color are generally poor predictors of accuracy, so the value is typically low.

The model does not include any interaction between race and eye color because too few black subjects had light eyes to produce meaningful results. Eye color was unavailable for some subjects, so multiple imputation by chained equations (MICE) was used to impute it.

Demographic breakdown of false positives

The figures below show the demographic composition of false positives. Most false positives are likely to be white and male since they comprise the majority of the test data. The precise demographic breakdown of the test data is
  • 68.5% White and male
  • 16.8% White and female
  • 12.8% Black and male
  • 1.9% Black and female
and
  • 85.5% Dark eyes
  • 14.5% Light eyes

Other races, sexes, and eye colors are ignored due to their infrequent occurrence.


Demographic breakdown of hits

SOAR

The figure below shows the FNIR at FPIR=0.01 (t = 0.529) for different demographic groups. The bars show 95% confidence intervals.

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

Some consolidation of demographic information was necessary to improve statistical power. Eye color was consolidated to either light (grey, blue, or green) or dark (brown or black). Some subjects were labeled as being neither male nor female. Meaningful results for these categories could not be obtained because their sample sizes are too small. For the same reason, results for races other than white and black are not shown. The precise definitions of race, sex, and eye color used here can be found in EBTS version 10.0.

Logistic Regression

This section models the relationship between FNIR and various demographic characteristics using logistic regression. The response variable is whether the search produces a false negative at an FPIR of 0.01. The precise logit relationship is

\[\ell = \log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{Sex} + \beta_2\,\text{Race} + \beta_3\,\text{Eye color} + \beta_4\,(\text{Sex}\times\text{Eye color}) + \beta_5\,(\text{Sex}\times\text{Race})\]

where \(p\) is the probability of a false negative and \(\ell\) is the log-odds (logit) of a false negative.

n = 312,114
McFadden’s \(R^2\) = 0.0000437

Negative (blue) values mean the probability of a miss is decreased. McFadden’s pseudo \(R^2\) is a measure of goodness-of-fit that produces values between 0 and 1. Race, sex, and eye color are generally poor predictors of accuracy, so the value is typically low.

The model does not include any interaction between race and eye color because too few black subjects had light eyes to produce meaningful results. Eye color was unavailable for some subjects, so multiple imputation by chained equations (MICE) was used to impute it.

Demographic breakdown of false positives

The figures below show the demographic composition of false positives. Most false positives are likely to be white and male since they comprise the majority of the test data. The precise demographic breakdown of the test data is
  • 68.5% White and male
  • 16.8% White and female
  • 12.8% Black and male
  • 1.9% Black and female
and
  • 85.5% Dark eyes
  • 14.5% Light eyes

Other races, sexes, and eye colors are ignored due to their infrequent occurrence.


Demographic breakdown of hits

Dermalog

The figure below shows the FNIR at FPIR=0.01 (t = 68) for different demographic groups. The bars show 95% confidence intervals.

Dataset: Operational Dataset 4th pull
Samples used: Both eyes
Enrolled Population: 500K people
Enrollment Method: One enrollment session per person

Some consolidation of demographic information was necessary to improve statistical power. Eye color was consolidated to either light (grey, blue, or green) or dark (brown or black). Some subjects were labeled as being neither male nor female. Meaningful results for these categories could not be obtained because their sample sizes are too small. For the same reason, results for races other than white and black are not shown. The precise definitions of race, sex, and eye color used here can be found in EBTS version 10.0.

Logistic Regression

This section models the relationship between FNIR and various demographic characteristics using logistic regression. The response variable is whether the search produces a false negative at an FPIR of 0.01. The precise logit relationship is

\[\ell = \log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{Sex} + \beta_2\,\text{Race} + \beta_3\,\text{Eye color} + \beta_4\,(\text{Sex}\times\text{Eye color}) + \beta_5\,(\text{Sex}\times\text{Race})\]

where \(p\) is the probability of a false negative and \(\ell\) is the log-odds (logit) of a false negative.

n = 312,114
McFadden’s \(R^2\) = 0.0000437

Negative (blue) values mean the probability of a miss is decreased. McFadden’s pseudo \(R^2\) is a measure of goodness-of-fit that produces values between 0 and 1. Race, sex, and eye color are generally poor predictors of accuracy, so the value is typically low.

The model does not include any interaction between race and eye color because too few black subjects had light eyes to produce meaningful results. Eye color was unavailable for some subjects, so multiple imputation by chained equations (MICE) was used to impute it.

Demographic breakdown of false positives

The figures below show the demographic composition of false positives. Most false positives are likely to be white and male since they comprise the majority of the test data. The precise demographic breakdown of the test data is
  • 68.5% White and male
  • 16.8% White and female
  • 12.8% Black and male
  • 1.9% Black and female
and
  • 85.5% Dark eyes
  • 14.5% Light eyes

Other races, sexes, and eye colors are ignored due to their infrequent occurrence.


Demographic breakdown of hits

Automated Quality Assessment

Some of the participants’ submissions output estimates of sample quality for each processed iris image. The ANSI/NIST-ITL 1-2011 standard requires these estimates to be in the range 0 to 100 and to quantitatively express the predicted matching performance of the sample. Error-reject rate curves show how FNIR can be reduced by discarding the poorest quality samples in the test data. In our case, the quality of a search was set to the minimum of the qualities assigned to the searched image and its enrolled mate.

The figure below demonstrates that FNIR (i.e. the ‘miss rate’) can be reduced by almost 20% by discarding just 1% of the poorest quality searches. Presumably, this 1% involved samples where the subject was blinking, moving, looking off-axis at the moment of capture, etc. The IREX III supplemental failure analysis found that matching failures for the most accurate matchers over a different dataset were almost entirely due to poor presentation of the iris.
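An error-reject computation along these lines can be sketched as follows. The quality of each search is taken as the minimum of the probe and mate qualities, the lowest-quality fraction of searches is discarded, and FNIR is recomputed on the rest. The data and the function name are hypothetical.

```python
def fnir_after_reject(search_quality, missed, reject_fraction):
    """FNIR over the searches that remain after discarding the
    lowest-quality reject_fraction of searches."""
    order = sorted(range(len(missed)), key=lambda i: search_quality[i])
    n_reject = int(reject_fraction * len(order))
    kept = order[n_reject:]
    return sum(missed[i] for i in kept) / len(kept)

# Hypothetical qualities (0 to 100) for searched images and their enrolled mates.
probe_quality = [10, 80, 90, 70]
mate_quality = [50, 20, 95, 60]
missed = [True, True, False, False]

# Quality of a search = minimum of the two sample qualities.
search_quality = [min(p, m) for p, m in zip(probe_quality, mate_quality)]

for fraction in (0.0, 0.25, 0.5):
    print(fraction, fnir_after_reject(search_quality, missed, fraction))
```

If the quality values are good predictors of matching failure, the curve drops steeply at small rejection fractions, as the figure suggests for the first 1% of searches.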

Dataset: Operational Dataset 4th pull
Samples used: One eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One enrollment session per person


The stacked barplot below shows how sample quality impacts the probability that a search will miss (i.e. fail to return the correct mate). Samples assigned low quality values should be more likely to miss. For Neurotechnology’s matcher, when the assigned value is 0 the probability of a miss is greater than 50%. FPIR is set to \(0.01\).

Dataset: Operational Dataset 4th pull
Samples used: One eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One enrollment session per person

The sample qualities of left and right iris images acquired during the same session are expected to be highly correlated. In addition to having similar capture environments, dual-eye cameras acquire both images at nearly the same instant, so poor presentation of the irides (e.g. blinking or moving at the moment of capture) detrimentally affects both images. For this reason, matching both acquired images rather than just one yields only a moderate improvement in accuracy. The figure below shows the distribution of qualities, with each axis representing the quality of one of the iris images (left or right) acquired during the same capture session.

Dataset: Operational Dataset 4th pull
Samples used: One eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One enrollment session per person


The acquisition protocol for OPS4 images has probably improved over time. Better iris cameras and capture environments are likely to have improved the quality of the acquired images. Iris recognition accuracy is highly dependent on the prevalence of very poor quality samples. Misses tend to occur when the subject was blinking, moving, looking off-axis (etc.) at the instant of capture. The figure below shows the prevalence of these very low quality samples in OPS4 for each capture year. Comparatively few images in OPS4 were collected prior to 2014 so results for these images are omitted. An iris sample was deemed to have very low quality if its quality value is among the lowest 2% (i.e. below the 2% quantile) of all images in OPS4.
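The per-year prevalence described above can be sketched as follows, assuming each sample is available as a `(capture_year, quality)` pair. That layout and the toy values are assumptions. The example uses a 25% quantile so that the small toy set has samples below the cutoff, whereas the analysis above uses 2%.

```python
from collections import defaultdict

def low_quality_prevalence(samples, quantile=0.02):
    """Fraction of samples per capture year whose quality falls below the
    given quantile of all qualities (pooled across years)."""
    qualities = sorted(q for _, q in samples)
    cutoff = qualities[int(quantile * len(qualities))]
    totals, low = defaultdict(int), defaultdict(int)
    for year, quality in samples:
        totals[year] += 1
        if quality < cutoff:          # "below the quantile"
            low[year] += 1
    return {year: low[year] / totals[year] for year in sorted(totals)}

# Hypothetical (capture_year, quality) pairs.
samples = [(2014, 1), (2014, 50), (2014, 60), (2014, 70),
           (2015, 55), (2015, 65), (2015, 75), (2015, 85)]
print(low_quality_prevalence(samples, quantile=0.25))
```

Because quality values are integers with many ties, the fraction of samples strictly below the cutoff can be somewhat smaller than the nominal quantile.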

Dataset: Operational Dataset 4th pull
Samples used: One eye
Enrolled Population: 1M irides (500K people)
Enrollment Method: One enrollment session per person

Algorithm Fusion

Combining the results from multiple submissions sometimes yields improved accuracy over individual submissions. In this section, score-level fusion is used to combine search results from multiple submissions. Equal-weighted Neyman-Pearson fusion is used to merge candidate lists from different submissions into a single consolidated candidate list. The dissimilarity score associated with each candidate is normalized prior to fusion (see LFAR score). This normalized score is a measure of similarity rather than dissimilarity. Any candidate appearing on multiple lists is assigned a single fused score by summing the individual LFAR scores. The merged candidate list is then reordered by the fused LFAR scores.
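The merging step can be sketched as below. The example assumes the per-candidate scores have already been normalized to similarity-like LFAR values. The normalization itself is not reproduced, and the identities and scores are made up.

```python
from collections import defaultdict

def fuse(candidate_lists):
    """Merge candidate lists with equal weights by summing normalized scores.

    candidate_lists: one list per submission of (identity, normalized_score)
    pairs, where a higher normalized score means a stronger match.
    Returns the consolidated list, best candidate first.
    """
    fused = defaultdict(float)
    for candidates in candidate_lists:
        for identity, score in candidates:
            fused[identity] += score   # candidates on several lists accumulate
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical normalized candidate lists from two submissions.
list_a = [("A", 3.0), ("B", 2.0)]
list_b = [("B", 2.5), ("C", 1.0)]
print(fuse([list_a, list_b]))
```

In this toy case, candidate B is ranked second by the first submission but moves to the top after fusion because both submissions return it.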

Only fusion results that yield an improvement in accuracy over the individual submissions are shown.

Twins Dataset

Idemia

Between 2010 and 2018, West Virginia University and the University of Notre Dame collected iris images of identical and mirror twins during the annual Twinsday Festival. The data collection procedure is described in Sabatier et al. Many twins participated in the data collection in multiple years. In all, \(5,078\) iris images from \(691\) twins were used to produce the results below.

The comparison scores were collected as follows: all available images were enrolled in a database, and the same set of images was then searched against the database, producing a total of \(5,078 \times 5,078 \approx 25.8\) million scores, including \(72,587\) twins scores, \(75,651\) cross-eye (i.e. left-vs-right irises from the same person) scores, and 25.5 million nonmated scores. The scores are not truly one-to-one if the submission performs enrollment-side score or template normalization.

Histograms of Score Distributions

Cumulative Score Distributions

The shading shows an estimate of the 95% confidence interval for twins comparisons.

One-to-many Matching

The impact of twins on one-to-many accuracy is also assessed. An enrollment database of one million eyes was generated, padded with images from the OPS4 dataset to reach the targeted enrollment size. \(4,384\) searches were performed where the searched person’s twin was enrolled. If twins detrimentally impact matching accuracy, they should be more likely to cause false positives. The table below shows how frequently twins from the \(4,384\) searches contribute to false positives.

Number of times twin appears at rank 10 or above: 0

Samples used: One eye
Enrolled Population: 5,078 iris images
Number of Searches: 5,078

Neurotechnology

Between 2010 and 2018, West Virginia University and the University of Notre Dame collected iris images of identical and mirror twins during the annual Twinsday Festival. The data collection procedure is described in Sabatier et al. Many twins participated in the data collection in multiple years. In all, \(5,078\) iris images from \(691\) twins were used to produce the results below.

The comparison scores were collected as follows: all available images were enrolled in a database, and the same set of images was then searched against the database, producing a total of \(5,078 \times 5,078 \approx 25.8\) million scores, including \(72,587\) twins scores, \(75,651\) cross-eye (i.e. left-vs-right irises from the same person) scores, and 25.5 million nonmated scores. The scores are not truly one-to-one if the submission performs enrollment-side score or template normalization.

Histograms of Score Distributions

Cumulative Score Distributions

The shading shows an estimate of the 95% confidence interval for twins comparisons.

One-to-many Matching

The impact of twins on one-to-many accuracy is also assessed. An enrollment database of one million eyes was generated, padded with images from the OPS4 dataset to reach the targeted enrollment size. \(4,384\) searches were performed where the searched person’s twin was enrolled. If twins detrimentally impact matching accuracy, they should be more likely to cause false positives. The table below shows how frequently twins from the \(4,384\) searches contribute to false positives.

Number of times twin appears at rank 10 or above: 19

Samples used: One eye
Enrolled Population: 5,078 iris images
Number of Searches: 5,078

NEC

Between 2010 and 2018, West Virginia University and the University of Notre Dame collected iris images of identical and mirror twins during the annual Twinsday Festival. The data collection procedure is described in Sabatier et al. Many twins participated in the data collection in multiple years. In all, \(5,078\) iris images from \(691\) twins were used to produce the results below.

The comparison scores were collected as follows: all available images were enrolled in a database, and the same set of images was then searched against the database, producing a total of \(5,078 \times 5,078 \approx 25.8\) million scores, including \(72,587\) twins scores, \(75,651\) cross-eye (i.e. left-vs-right irises from the same person) scores, and 25.5 million nonmated scores. The scores are not truly one-to-one if the submission performs enrollment-side score or template normalization.

Histograms of Score Distributions

Cumulative Score Distributions

The shading shows an estimate of the 95% confidence interval for twins comparisons.

One-to-many Matching

The impact of twins on one-to-many accuracy is also assessed. An enrollment database of one million eyes was generated, padded with images from the OPS4 dataset to reach the targeted enrollment size. \(4,384\) searches were performed where the searched person’s twin was enrolled. If twins detrimentally impact matching accuracy, they should be more likely to cause false positives. The table below shows how frequently twins from the \(4,384\) searches contribute to false positives.

Number of times twin appears at rank 10 or above: 2

Samples used: One eye
Enrolled Population: 5,078 iris images
Number of Searches: 5,078

EyeCool

Between 2010 and 2018, West Virginia University and the University of Notre Dame collected iris images of identical and mirror twins during the annual Twinsday Festival. The data collection procedure is described in Sabatier et al. Many twins participated in the data collection in multiple years. In all, \(5,078\) iris images from \(691\) twins were used to produce the results below.

The comparison scores were collected as follows: all available images were enrolled in a database, and the same set of images was then searched against the database, producing a total of \(5,078 \times 5,078 \approx 25.8\) million scores, including \(72,587\) twins scores, \(75,651\) cross-eye (i.e. left-vs-right irises from the same person) scores, and 25.5 million nonmated scores. The scores are not truly one-to-one if the submission performs enrollment-side score or template normalization.

Histograms of Score Distributions

Cumulative Score Distributions

The shading shows an estimate of the 95% confidence interval for twins comparisons.

One-to-many Matching

The impact of twins on one-to-many accuracy is also assessed. An enrollment database of one million eyes was generated, padded with images from the OPS4 dataset to reach the targeted enrollment size. \(4,384\) searches were performed where the searched person’s twin was enrolled. If twins detrimentally impact matching accuracy, they should be more likely to cause false positives. The table below shows how frequently twins from the \(4,384\) searches contribute to false positives.

Number of times twin appears at rank 10 or above: 3

Samples used: One eye
Enrolled Population: 5,078 iris images
Number of Searches: 5,078

SOAR

Between 2010 and 2018, West Virginia University and the University of Notre Dame collected iris images of identical and mirror twins during the annual Twinsday Festival. The data collection procedure is described in Sabatier et al. Many twins participated in the data collection in multiple years. In all, \(5,078\) iris images from \(691\) twins were used to produce the results below.

The comparison scores were collected as follows: all available images were enrolled in a database, and the same set of images was then searched against the database, producing a total of \(5,078 \times 5,078 \approx 25.8\) million scores, including \(72,587\) twins scores, \(75,651\) cross-eye (i.e. left-vs-right irises from the same person) scores, and 25.5 million nonmated scores. The scores are not truly one-to-one if the submission performs enrollment-side score or template normalization.

Histograms of Score Distributions

Cumulative Score Distributions

The shading shows an estimate of the 95% confidence interval for twins comparisons.

One-to-many Matching

The impact of twins on one-to-many accuracy is also assessed. An enrollment database of one million eyes was generated, padded with images from the OPS4 dataset to reach the targeted enrollment size. \(4,384\) searches were performed where the searched person’s twin was enrolled. If twins detrimentally impact matching accuracy, they should be more likely to cause false positives. The table below shows how frequently twins from the \(4,384\) searches contribute to false positives.

Number of times twin appears at rank 10 or above: 3

Samples used: One eye
Enrolled Population: 5,078 iris images
Number of Searches: 5,078

Dermalog

Between 2010 and 2018, West Virginia University and the University of Notre Dame collected iris images of identical and mirror twins during the annual Twinsday Festival. The data collection procedure is described in Sabatier et al. Many twins participated in the data collection in multiple years. In all, \(5,078\) iris images from \(691\) twins were used to produce the results below.

The comparison scores were collected as follows: all available images were enrolled in a database, and the same set of images was then searched against the database, producing a total of \(5,078 \times 5,078 \approx 25.8\) million scores, including \(72,587\) twins scores, \(75,651\) cross-eye (i.e. left-vs-right irises from the same person) scores, and 25.5 million nonmated scores. The scores are not truly one-to-one if the submission performs enrollment-side score or template normalization.

Histograms of Score Distributions

Cumulative Score Distributions

The shading shows an estimate of the 95% confidence interval for twins comparisons.

One-to-many Matching

The impact of twins on one-to-many accuracy is also assessed. An enrollment database of one million eyes was generated, padded with images from the OPS4 dataset to reach the targeted enrollment size. \(4,384\) searches were performed where the searched person’s twin was enrolled. If twins detrimentally impact matching accuracy, they should be more likely to cause false positives. The table below shows how frequently twins from the \(4,384\) searches contribute to false positives.

Number of times twin appears at rank 10 or above: 2

Samples used: One eye
Enrolled Population: 5,078 iris images
Number of Searches: 5,078

How to Participate

Participation is open to any commercial or academic organization free of charge. The first step is to mail the signed Participation Agreement and Data Transfer Agreement to NIST. Instructions on building a submission can be found in the concept of operations (CONOPS) document. The CONOPS document is supplemented by the irex.h and structs.h header files. To assist with development, a minimal working “stub” (a.k.a. null implementation) is also available. See also our FAQ.

Participants are allowed to submit an implementation once every 3 calendar months.

Please send comments and recommendations to irex@nist.gov.

Contact Info

Inquiries and comments may be submitted to irex@nist.gov. Subscribe to the IREX mailing list to stay up-to-date on all IREX-related activities.